Implementations of the table data structure
Collision likelihoods and load factors for hash tables
A simple Hash Table in operation
Double hashing is a technique used to resolve clustering issues that can occur with linear probing in hash tables. When we insert elements into a hash table using linear probing, collisions can cause elements to cluster together, leading to inefficiencies in search and insertion operations. Double hashing addresses this problem by employing a secondary hash function to determine alternative positions for elements when collisions occur.
Let's break down how double hashing works with a simple example. Imagine we have a hash table with 11 slots, and we want to insert the following keys: PHL, ORY, GCM, HKG, GLA, AKL, FRA, LAX, DCA, BHX.
We start by applying a primary hash function to determine the initial position for each key in the hash table. Treating each three-letter key as an integer (the exact encoding is not important here), let's use the modulo operator with 11 as the divisor for simplicity:
h1(key) = key % 11
The results for our keys are as follows:
PHL -> 3
ORY -> 1
GCM -> 1 (collision with ORY)
HKG -> 3 (collision with PHL)
GLA -> 9
AKL -> 2
FRA -> 6
LAX -> 7
DCA -> 2 (collision with AKL)
BHX -> 3 (collision with PHL and HKG)
When collisions occur, we apply a secondary hash function to determine the step size or "jump" to find an alternative position for the key. In this example, we'll use a secondary hash function based on integer division:
h2(key) = (key / 11) % 11
However, to handle the possibility of getting 0 from the secondary hash function, we adjust it slightly:
h2(key) = (key / 11) % 11   if (key / 11) % 11 != 0
h2(key) = 1                 otherwise
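The two hash functions above can be sketched directly in Python. This assumes keys have already been converted to integers, and uses integer division (//) for the "key / 11" step:

```python
TABLE_SIZE = 11

def h1(key: int) -> int:
    """Primary hash: the initial slot in an 11-slot table."""
    return key % TABLE_SIZE

def h2(key: int) -> int:
    """Secondary hash: the probe step size, adjusted so it is never 0."""
    step = (key // TABLE_SIZE) % TABLE_SIZE
    return step if step != 0 else 1
```

The adjustment in h2 matters: a step size of 0 would probe the same occupied slot forever.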
We compute the secondary hash values for our keys:
PHL -> 4
ORY -> 1
GCM -> 1
HKG -> 3
GLA -> 9
AKL -> 2
FRA -> 6
LAX -> 7
DCA -> 2
BHX -> 3
Insert PHL and ORY: PHL goes into slot 3 and ORY into slot 1; neither causes a collision.
Insert GCM: GCM collides with ORY in slot 1. Using h2(GCM) = 1 as the step size, we move one location to the left and find slot 0 empty, so GCM goes there.
Insert HKG: HKG collides with PHL in slot 3. Using h2(HKG) = 3 as the step size, we try every third location to the left: slot 0 is occupied by GCM, so we wrap around to slot 8, which is empty. We successfully insert HKG without creating a block.
Insert GLA, AKL, FRA, and LAX: these keys hash to slots 9, 2, 6, and 7, all of which are empty, so no probing is needed.
Insert DCA: DCA collides with AKL in slot 2. With h2(DCA) = 2, we probe slots 0, 9, and 7, all occupied, before placing DCA in slot 5.
Insert BHX: BHX collides with PHL in slot 3. With h2(BHX) = 3, we probe slots 0, 8, 5, and 2, all occupied, before finding slot 10 empty. At this point all keys are inserted.
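The whole insertion sequence can be reproduced with a short sketch. Since the text does not specify how the three-letter keys are converted to integers, the code below simply uses the h1 and h2 values from the tables above as lookup dictionaries:

```python
# Hash values taken directly from the tables above
# (the string-to-integer encoding itself is not specified).
H1 = {"PHL": 3, "ORY": 1, "GCM": 1, "HKG": 3, "GLA": 9,
      "AKL": 2, "FRA": 6, "LAX": 7, "DCA": 2, "BHX": 3}
H2 = {"PHL": 4, "ORY": 1, "GCM": 1, "HKG": 3, "GLA": 9,
      "AKL": 2, "FRA": 6, "LAX": 7, "DCA": 2, "BHX": 3}

TABLE_SIZE = 11

def insert(table, key):
    """Place key using double hashing, probing to the left on collision."""
    pos = H1[key]
    step = H2[key]
    while table[pos] is not None:
        pos = (pos - step) % TABLE_SIZE  # step left, wrapping around
    table[pos] = key

table = [None] * TABLE_SIZE
for key in ["PHL", "ORY", "GCM", "HKG", "GLA",
            "AKL", "FRA", "LAX", "DCA", "BHX"]:
    insert(table, key)

print(table)
```

Running this produces the final layout with only slot 4 left empty, showing how the varying step sizes scatter the colliding keys instead of piling them into one block.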
By using double hashing, we effectively distribute keys across the hash table, minimizing clustering and avoiding the formation of large blocks of consecutive occupied slots. Additionally, the direction in which we search for alternative positions (left or right) is a convention and can be chosen based on implementation preferences.
Now, let's discuss the search complexity of double hashing. Unlike linear probing, deriving the efficiency of double hashing is more involved, and we won't go through the derivation here. With a load factor λ, however, successful searches require approximately (1/λ)ln(1/(1-λ)) comparisons on average, while unsuccessful searches require approximately 1/(1-λ). The key observation is that these costs depend only on λ, not on the number of stored elements, so for a fixed load factor searches run in O(1) expected time, making double hashing an efficient technique for retrieving elements from the hash table.
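The two formulas are easy to evaluate for concrete load factors, which makes the effect of λ tangible. A quick sketch:

```python
import math

def successful(load: float) -> float:
    """Average comparisons for a successful search: (1/λ) ln(1/(1-λ))."""
    return (1 / load) * math.log(1 / (1 - load))

def unsuccessful(load: float) -> float:
    """Average comparisons for an unsuccessful search: 1/(1-λ)."""
    return 1 / (1 - load)

for lam in (0.25, 0.5, 0.75, 0.9):
    print(f"λ = {lam}: successful ≈ {successful(lam):.2f}, "
          f"unsuccessful ≈ {unsuccessful(lam):.2f}")
```

At λ = 0.5 a successful search averages about 1.39 comparisons and an unsuccessful one 2, while at λ = 0.9 the averages climb to roughly 2.56 and 10, which is why hash tables are usually resized well before they fill up.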
In conclusion, double hashing provides an effective solution to clustering issues in hash tables by employing a secondary hash function to resolve collisions. By distributing keys more evenly throughout the table, double hashing improves the efficiency of search and insertion operations, making it a valuable tool in the implementation of hash-based data structures.
Double hashing is a collision resolution technique used in hash tables. It involves applying two hash functions: a primary one to determine the initial slot, and a secondary one to calculate an alternative position in case of collisions. The secondary hash function guides the search for an empty slot by specifying the step size. If a collision occurs, the algorithm iterates through slots according to the secondary hash function until an empty slot is found. This approach helps distribute elements more evenly across the hash table, reducing clustering and improving the efficiency of search and insertion operations.
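Putting the pieces together, here is a minimal sketch of an integer-key table that follows this scheme, using the h1 and h2 definitions from the example (the class name and interface are illustrative, not from the text):

```python
class DoubleHashTable:
    """Minimal open-addressing table using double hashing (a sketch)."""

    def __init__(self, size: int = 11):
        self.size = size
        self.slots = [None] * size

    def _h1(self, key: int) -> int:
        return key % self.size

    def _h2(self, key: int) -> int:
        step = (key // self.size) % self.size
        return step if step != 0 else 1  # a step of 0 would loop forever

    def insert(self, key: int) -> None:
        # Assumes the table is not full (in practice, resize first).
        pos, step = self._h1(key), self._h2(key)
        while self.slots[pos] is not None:
            pos = (pos - step) % self.size  # probe to the left, wrapping
        self.slots[pos] = key

    def search(self, key: int) -> int:
        """Return the slot index holding key, or -1 if absent."""
        pos, step = self._h1(key), self._h2(key)
        for _ in range(self.size):
            if self.slots[pos] is None:
                return -1  # an empty slot ends the probe sequence
            if self.slots[pos] == key:
                return pos
            pos = (pos - step) % self.size
        return -1
```

Note that search retraces exactly the probe sequence insert used, stopping at the first empty slot; this is why deletions in open addressing need special handling (e.g., tombstones), which is beyond the scope of this sketch.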